Add periodic cache refresh timer #20

Merged
jdoss merged 1 commit into master from feat/cache-refresh-timer
Apr 8, 2026

@jdoss jdoss commented Apr 8, 2026

Summary

  • A new psi-{provider}-setup.timer is generated alongside the setup service and triggers the same unit every cache.refresh_interval (default 1h). Today only psi-infisical-setup.timer is emitted, because nitrokeyhsm is local-only.
  • Two new CacheConfig fields: refresh_interval and refresh_randomized_delay. Both accept systemd time strings.
  • psi systemd install writes the timer automatically when the cache is configured, and enables it when --enable is passed. This works in both native and container modes.
  • Docs updated in docs/secret-cache.md, README.md, and CLAUDE.md.

Why

Before this PR, PSI populated the cache at boot and on lookup miss, and operators could run psi cache refresh manually. A secret rotated in Infisical between reboots stayed stale until someone intervened or the host restarted. Option (a) from the cache-sync design discussion was the lowest-effort fix: an external systemd timer that kicks the existing setup unit. This matches the existing pattern for psi-tls-renew.timer and keeps serve code untouched.

What changes

psi/settings.py

  • CacheConfig.refresh_interval: str = "1h"
  • CacheConfig.refresh_randomized_delay: str = "5m"

Both are passed straight through to systemd, so any valid systemd time string works (30m, 2h, 1d, etc.).
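
For illustration, the two fields could be sketched as below. This is a minimal stand-in using a plain dataclass; the real CacheConfig in psi/settings.py may use a different base class, and the `enabled` and `backend` defaults as well as the `example-backend` value are assumptions made only for this sketch.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class CacheConfig:
    """Illustrative stand-in for the cache settings, not the real class."""

    enabled: bool = False          # assumed default
    backend: Optional[str] = None  # assumed default
    # systemd time strings, handed to the generated timer unchanged.
    refresh_interval: str = "1h"
    refresh_randomized_delay: str = "5m"


# Shorten the cadence for a host whose secrets rotate more often.
cfg = CacheConfig(enabled=True, backend="example-backend", refresh_interval="30m")
print(cfg.refresh_interval, cfg.refresh_randomized_delay)  # prints: 30m 5m
```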

psi/unitgen.py

  • generate_provider_setup_timer(provider, interval, randomized_delay) — builds a .timer unit targeting psi-{provider}-setup.service. Uses OnBootSec + OnUnitActiveSec + Persistent=true so the schedule survives reboots and missed intervals.
  • provider_supports_refresh(provider) — returns True only for providers whose setup path talks to a remote. Set of one today: {"infisical"}.
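
A rough sketch of what these two helpers might look like, based only on the description above. The function names and the directives come from this PR; the Description text, the use of the interval for OnBootSec, and the WantedBy target are assumptions, and the real psi/unitgen.py may differ.

```python
# Providers whose setup path talks to a remote (a set of one today).
REFRESHABLE_PROVIDERS = frozenset({"infisical"})


def provider_supports_refresh(provider: str) -> bool:
    """Return True only for providers that have something to re-fetch."""
    return provider in REFRESHABLE_PROVIDERS


def generate_provider_setup_timer(
    provider: str, interval: str = "1h", randomized_delay: str = "5m"
) -> str:
    """Render a .timer unit that targets psi-{provider}-setup.service."""
    lines = [
        "[Unit]",
        # Assumed wording; the PR only says a description is present.
        f"Description=Periodic PSI cache refresh for {provider}",
        "",
        "[Timer]",
        f"Unit=psi-{provider}-setup.service",
        f"OnBootSec={interval}",        # assumed to reuse the interval
        f"OnUnitActiveSec={interval}",
        f"RandomizedDelaySec={randomized_delay}",
        "Persistent=true",
        "",
        "[Install]",
        "WantedBy=timers.target",       # assumed install target
    ]
    return "\n".join(lines) + "\n"


print(generate_provider_setup_timer("infisical", "30m", "2m"))
```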

psi/installer.py

  • New _write_refresh_timers helper emits the timer for each refreshable provider. Returns early if cache.enabled is False or cache.backend is None — no point scheduling refreshes if PSI is not caching.
  • _install_native and _install_container both call it. Timers live under the systemd unit dir in both modes (timers are plain units regardless of whether the triggered service comes from a quadlet).
  • _enable_units gains a refresh_timers kwarg and enables each timer with systemctl enable --now when --enable is passed.
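
The early-return and filtering logic described above could look roughly like this. It is a self-contained sketch, not the code in psi/installer.py: the function signature, the abbreviated timer body, and the `example-backend` value are all hypothetical.

```python
import tempfile
from dataclasses import dataclass
from pathlib import Path
from typing import Iterable, List, Optional

# Per the PR, only infisical has a remote to re-fetch from today.
REFRESHABLE_PROVIDERS = frozenset({"infisical"})


@dataclass
class CacheConfig:
    """Stand-in for the real settings class (defaults partly assumed)."""

    enabled: bool = False
    backend: Optional[str] = None
    refresh_interval: str = "1h"
    refresh_randomized_delay: str = "5m"


def write_refresh_timers(
    unit_dir: Path, providers: Iterable[str], cache: CacheConfig
) -> List[str]:
    """Write one timer per refreshable provider and return the unit names."""
    # No point scheduling refreshes if PSI is not caching.
    if not cache.enabled or cache.backend is None:
        return []
    written = []
    for provider in providers:
        if provider not in REFRESHABLE_PROVIDERS:
            continue
        name = f"psi-{provider}-setup.timer"
        # Abbreviated body; the real unit has more sections.
        body = (
            "[Timer]\n"
            f"Unit=psi-{provider}-setup.service\n"
            f"OnUnitActiveSec={cache.refresh_interval}\n"
            f"RandomizedDelaySec={cache.refresh_randomized_delay}\n"
        )
        (unit_dir / name).write_text(body)
        written.append(name)
    return written


with tempfile.TemporaryDirectory() as tmp:
    cache = CacheConfig(enabled=True, backend="example-backend")
    print(write_refresh_timers(Path(tmp), ["infisical", "nitrokeyhsm"], cache))
    # prints: ['psi-infisical-setup.timer']
```

The returned names are what a caller would hand on to the enable step, in the spirit of the refresh_timers kwarg on _enable_units.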

Tests

  • tests/test_unitgen.py: TestProviderRefreshSupport (3 tests) and TestProviderSetupTimer (5 tests) covering the target unit, interval/delay pass-through, Persistent=true, the [Install] section, and the description text.
  • tests/test_installer.py: TestWriteRefreshTimers (5 tests) covering the happy path, cache disabled, no backend configured, nitrokeyhsm-only providers, and custom interval.

Docs

  • docs/secret-cache.md: rewrites the rotation section into "Rotation and periodic refresh" with three subsections — scheduled refresh (timer details, tuning via systemctl edit), manual refresh (CLI commands), and on-miss refill (what the serve process already does). Adds refresh_interval and refresh_randomized_delay to the configuration example.
  • README.md: updates the cache config example to include the new fields, adds a paragraph about the auto-generated timer, adds psi-infisical-setup.timer to the FCOS generator list, and notes in the CLI reference that psi cache refresh is only needed for out-of-band rotations.
  • CLAUDE.md: same CLI-reference note.

Test plan

  • uv run ruff check psi/ tests/ — clean
  • uv run ruff format --check psi/ tests/ — clean
  • uv run ty check — clean
  • uv run pytest -q — 309 passed (13 new tests)
  • On a test server: pull the new image, regenerate quadlets, confirm psi-infisical-setup.timer exists and systemctl list-timers shows the next run at refresh_interval from boot
  • Rotate a secret in Infisical, wait for the next scheduled refresh, confirm the new value is served via podman exec psi-secrets psi cache status --verify and an actual container lookup

Cache miss refill covers new secrets, and psi cache refresh covers
manual rotation, but a secret rotated upstream between reboots stayed
stale until an operator intervened or the host restarted. Add a
systemd timer generated alongside the provider setup unit that
re-runs psi-{provider}-setup.service on a configurable cadence so
rotations propagate without manual action.

CacheConfig gets refresh_interval (default 1h) and
refresh_randomized_delay (default 5m) — systemd time strings that
map directly to OnUnitActiveSec and RandomizedDelaySec on the
generated timer. The timer uses OnBootSec + OnUnitActiveSec +
Persistent=true so the schedule resumes correctly after downtime.

The timer is only emitted when the cache is enabled with a backend
AND the provider is in the refreshable set. Today that is just
infisical; nitrokeyhsm is local-only and has nothing to re-fetch.
The same helper runs for native and container-mode installs since
timers are plain systemd units regardless of whether the service
they trigger comes from a quadlet or a handwritten .service file.

Operators can override the cadence per host without editing config
via systemctl edit psi-infisical-setup.timer — documented in the
rotation section of docs/secret-cache.md.
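
As a hedged illustration of the per-host override, the helper below renders the kind of drop-in an operator might paste into the editor opened by systemctl edit. The function name is hypothetical. The leading empty assignment is there because timer settings accumulate across drop-ins; assigning an empty string resets the list, so OnBootSec has to be restated as well.

```python
def render_timer_override(interval: str, randomized_delay: str) -> str:
    """Render a drop-in body that replaces the generated cadence."""
    lines = [
        "[Timer]",
        # An empty assignment clears the timers inherited from the unit.
        "OnUnitActiveSec=",
        f"OnBootSec={interval}",
        f"OnUnitActiveSec={interval}",
        f"RandomizedDelaySec={randomized_delay}",
    ]
    return "\n".join(lines) + "\n"


# Example: refresh every 15 minutes on one host only.
print(render_timer_override("15m", "1m"))
```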
@jdoss jdoss merged commit ef2a684 into master Apr 8, 2026
2 checks passed
jdoss added a commit that referenced this pull request Apr 8, 2026
PR #20 generated psi-{provider}-setup.timer pointing directly at the
existing setup unit, but that never fired more than once. The setup
service uses Type=oneshot + RemainAfterExit=yes so ActiveEnterTimestamp
is set once and never updates, and OnUnitActiveSec on the timer is
anchored to that frozen timestamp. Once the first fire happens,
'next fire' = ActiveEnterTimestamp + interval — already in the past,
so systemd sets next_elapse to infinity and the timer never re-arms.
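
The arithmetic can be shown with a toy model. This is not systemd's implementation, only the rule stated above (next fire = ActiveEnterTimestamp + interval, dropped if it is not in the future); the function names and the start time are made up for the example.

```python
from datetime import datetime, timedelta
from typing import List, Optional


def next_elapse(
    active_enter: datetime, interval: timedelta, now: datetime
) -> Optional[datetime]:
    """Return the next fire time, or None if it is not in the future."""
    candidate = active_enter + interval
    return candidate if candidate > now else None


def simulate(timestamp_advances: bool, interval: timedelta, runs: int = 3) -> List[datetime]:
    """Collect fire times for a unit whose timestamp does or does not move."""
    start = datetime(2026, 4, 8, 0, 0, 0)
    active_enter = start
    now = start
    fired = []
    for _ in range(runs):
        nxt = next_elapse(active_enter, interval, now)
        if nxt is None:
            break  # the timer never re-arms
        now = nxt
        fired.append(now)
        if timestamp_advances:
            active_enter = now  # unit went active again on this run
    return fired


hour = timedelta(hours=1)
print(len(simulate(False, hour)))  # frozen timestamp: prints 1
print(len(simulate(True, hour)))   # advancing timestamp: prints 3
```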

Worse, even if it did fire again, systemctl start on a oneshot that is
currently active (exited) is a no-op, so the cache would not update.

Fix: generate a tiny wrapper psi-{provider}-refresh.service (plain
oneshot, no RemainAfterExit) that does:
    ExecStart=/usr/bin/systemctl restart psi-{provider}-setup.service

Point the timer at the wrapper. The wrapper's ActiveEnterTimestamp
moves forward every run, OnUnitActiveSec re-arms correctly, and
systemctl restart on the setup unit does re-run its ExecStart and
repopulate the cache.

Verified end-to-end on a test host with a 2 minute interval: the
second and third scheduled runs both rewrote cache.enc and the timer
showed NEXT/LEFT for the next cycle each time.
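
A sketch of the wrapper generator described in this commit message. The unit name and ExecStart line are taken from the text above; the function name and the Description wording are assumptions, and the actual generator may emit more directives.

```python
def generate_provider_refresh_service(provider: str) -> str:
    """Render the wrapper unit that the timer should point at."""
    lines = [
        "[Unit]",
        # Assumed wording for the description.
        f"Description=Restart PSI {provider} setup to refresh the cache",
        "",
        "[Service]",
        # Plain oneshot with no RemainAfterExit, so the unit goes inactive
        # after each run and its ActiveEnterTimestamp moves forward.
        "Type=oneshot",
        f"ExecStart=/usr/bin/systemctl restart psi-{provider}-setup.service",
    ]
    return "\n".join(lines) + "\n"


def refresh_service_name(provider: str) -> str:
    """Name to use for Unit= in the timer instead of the setup service."""
    return f"psi-{provider}-refresh.service"


print(refresh_service_name("infisical"))
print(generate_provider_refresh_service("infisical"))
```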